Genetic association analysis in sugarcane (Saccharum spp.) for sucrose accumulation in humid environments in Colombia

Background: Sucrose accumulation in sugarcane is affected by several environmental and genetic factors. Plant moisture is of critical importance because of its role in the synthesis and transport of sugars within the stalks, which affects sucrose concentration. In general, rainfall and high soil humidity during the ripening stage promote plant growth, increasing fresh weight and decreasing sucrose yield in the humid region of Colombia. Therefore, this study aimed to identify markers associated with sucrose accumulation or production in the humid environment of Colombia through a genome-wide association study (GWAS).

Results: Sucrose concentration was measured in 220 genotypes from Cenicaña's diverse panel at 10 (early maturity) and 13 (normal maturity) months after planting. For early maturity, data were collected in the plant cane and first ratoon, whereas for normal maturity they were collected in the plant cane, first ratoon, and second ratoon. A total of 137,890 SNPs were selected after sequencing the 220 genotypes through GBS, RADSeq, and whole-genome sequencing. The GWAS identified 77 markers significantly associated with sucrose concentration across the two ages, of which only 39 were close to candidate genes previously reported for sucrose accumulation and/or production. Among the candidate genes, 18 were highlighted because they are involved in sucrose hydrolysis (SUS6, CIN3, CINV1, CINV2), sugar transport (e.g., MST1, MST2, PLT5, SUT4, ERD6-like), phosphorylation processes (TPS genes), glycolysis (PFP-ALPHA, HXK3, PHI1), and transcriptional regulation (ERF12, ERF112). A further 64 genes were associated with glycosyltransferases, glycosidases, and hormones.

Conclusions: These results provide new insights into the molecular mechanisms involved in sucrose accumulation in sugarcane and contribute important genomic resources for future research in the humid environments of Colombia. The markers identified will also be validated for their potential application within Cenicaña's breeding program to assist the development of breeding populations.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12870-024-05233-y.


Background
Sugarcane (Saccharum spp.) is a major agronomic and industrial crop used to produce sugar, ethanol, and electricity worldwide, and it is an important component of the economies of tropical and subtropical countries [1,2]. Colombia ranks 14th among the world's largest sugar producers and 12th among the largest exporters, accounting for 1.1% of the world's sugar trade [3]. Given the economic importance of the crop, breeding programs around the world have focused on developing genotypes with high biomass (i.e., tons of cane per hectare, TCH), high sucrose content, and resistance to the most limiting diseases of the crop [4][5][6]. In general, this process involves selection decisions based on phenotypic information collected during the early stages of selection (three stages in most breeding programs) and in multi-environment trials (METs), which provide information on genotype adaptability to a given environment or to the target population of environments [4][5][6][7].
In Colombia, sugarcane breeding is carried out by the Colombian Sugarcane Research Center, Cenicaña, which has been releasing varieties specifically adapted to the agroecological conditions of the Cauca River Valley since 1990. Recently, Cenicaña's breeding program has sought to make its process more efficient by incorporating molecular markers. These markers have been used mainly in genome-wide association studies (GWAS) and in genomic prediction to assist the selection of the best genotypes for complex traits such as sucrose content, water stress, and biomass production, among others. Specifically, GWAS have allowed accurate and rapid identification of genes associated with traits of interest in sugarcane breeding programs around the world [5,8]. For instance, using AFLP, DArT, and SSR technologies, six markers were found to be associated with resistance to sugarcane yellow leaf virus (SCYLV) [9] and six with resistance to sugarcane brown rust [10]. In Argentina, 43 DArT markers were significantly associated with biomass (kg plot⁻¹) and 38 with sugar content in a panel of 88 sugarcane clones [11]. These markers have also been used in a genomic selection strategy by incorporating them into the model training phase at the Estación Experimental Agroindustrial Obispo Colombres (EEAOC), Argentina [12]. Similarly, in a panel of 308 sugarcane clones in the USA, 217 SNP markers were significantly associated with stalk diameter, leaf width, leaf length, stalk number, internode length, brix, total weight, dry weight, and water content [13]. Of these 217 markers, ten were related to sugar metabolism, specifically to synthases, hydrolases, and transferases [13]. Finally, the marker G1, associated with resistance to sugarcane orange rust, predicted 65.8% of resistant phenotypes in the original mapping population [14] and 71.43% in a collection of resistant Brazilian cultivars [15]; it has therefore been used within breeding schemes in Brazil.
Sucrose content is a key target in the genetic improvement of sugarcane. This trait depends on genetic factors such as enzymes (e.g., sucrose synthase, sucrose phosphate synthase, and invertase) and sucrose transporters, on biological processes (e.g., phosphorylation, hydrolysis, and regulatory mechanisms), and on environmental factors [16]. The main enzymes reported for sucrose accumulation are sucrose phosphate synthase (SPS), invertase, and sucrose synthase (SUS) [16]. SPS is involved in sucrose synthesis and accumulation, controlling the flux of carbon into sucrose and the movement of photosynthates from source to sink tissues [17][18][19]. Invertases generate a concentration gradient from source to sink by irreversibly hydrolyzing sucrose into an equimolar mixture of glucose and fructose, which is necessary for cell elongation [17][18][19][20]. SUS is a glycosyltransferase that catalyzes the reversible conversion of sucrose into fructose and UDP-glucose, both of which are required for respiration, starch biosynthesis, and fiber development [21,22].
Sucrose accumulation in sugarcane is influenced by the crop cycle and the variety planted [6]. For instance, differences have been reported between the plant cane and the ratoons, with accumulation in the ratoons commonly being higher than in the plant cane. However, even though the plant cane tends to have lower sucrose (%Cane) than the ratoons, the differences between varieties tend to remain stable. Under the conditions of China, for example, varieties with high, intermediate, and low sucrose content kept their rank throughout the plant cane, first ratoon, and second ratoon [6]. Similar behavior has been observed in Colombia, where the sucrose (%Cane) curves of different varieties follow the same trend in the plant cane and first ratoon, regardless of the crop cycle [23]. Additionally, in the commercial database for the four most planted varieties in the Cauca River Valley, Colombia, sugar yield was stable across crop cycles, with 11.00 ± 0.62, 11.05 ± 0.61, and 11.04 ± 0.57 %Cane for the plant cane, first ratoon, and second ratoon, respectively. This suggests that differences between genotypes remain stable across crop cycles, even though sucrose accumulation was slightly higher in the first ratoon. This behavior could be attributed to the climatic conditions in Colombia. The sugarcane agro-industrial sector in Colombia is located mainly in the Cauca River Valley, a region with a well-defined bimodal pattern of two rainy and two dry seasons [24], where harvesting is carried out throughout the year rather than during a defined harvest season (zafra) [25]. To improve agronomic practices, in 2001 Cenicaña classified the fields of the region into 51 agroecological zones that integrate soil texture, climate, and water resources [26]. These zones have allowed the region to be classified into semidry, humid, and foothill environments, which helps improve biomass and sugar content for the varieties planted and reduces the differences in biomass and sucrose content between crop cycles.
Several studies have concluded that sucrose accumulation in sugarcane depends on different environmental factors, with soil moisture being the most important during ripening [27,28]. During this process, plant growth slows when soil and plant moisture decrease [6,8,28], and stalk fresh weight declines due to dehydration [29]. Because sucrose content, expressed as percent cane (%Cane), is calculated on a fresh-weight basis, it is highly variable and linked to soil moisture and daytime stomatal activity [30,31]. During the dry season, lower internal moisture reduces the consumption of sugars for growth, so more reducing sugars are converted into sucrose and sucrose synthesis increases [5,28]. In contrast, during the rainy season or under high soil moisture, vegetative growth is high and sucrose accumulation is low [32]. Studies in Brazil have reported an exponential relationship between the rainfall observed in the 120 days before harvest and total recoverable sugars: when precipitation exceeds 100 mm, total recoverable sugars decline gradually, mainly in late varieties [32]. In Colombia, a similar pattern was found in the southern zone of the Cauca River Valley, classified as a humid environment [25,33]. Because increasing sucrose content in areas with high humidity is challenging, given the impact of environmental conditions on plant physiology and sugar accumulation, this study aimed to identify molecular markers and candidate genes associated with sucrose accumulation and/or production in a diverse population in the humid environments of the Cauca River Valley, Colombia. A schematic representation of the analysis and results is presented in Fig. 1.
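Because %Cane is a ratio over fresh weight, extra stalk water alone can lower it. The following Python sketch illustrates this dilution effect with hypothetical numbers. The function name `sucrose_pct_cane` and all masses are illustrative assumptions, not data from this study. Fresh weight is taken as dry matter plus water, with sucrose counted within the dry matter.

```python
def sucrose_pct_cane(sucrose_kg: float, dry_matter_kg: float, water_kg: float) -> float:
    """Sucrose as a percentage of stalk fresh weight (hypothetical helper).

    Fresh weight is taken as dry matter plus water; sucrose is assumed to be
    part of the dry matter, so it cannot exceed it.
    """
    if not 0 <= sucrose_kg <= dry_matter_kg:
        raise ValueError("sucrose must be between 0 and the dry matter mass")
    fresh_weight_kg = dry_matter_kg + water_kg
    return 100.0 * sucrose_kg / fresh_weight_kg


# Same hypothetical stalk sample and sucrose mass; only the water content differs.
drier = sucrose_pct_cane(sucrose_kg=14.0, dry_matter_kg=30.0, water_kg=70.0)
wetter = sucrose_pct_cane(sucrose_kg=14.0, dry_matter_kg=30.0, water_kg=85.0)
print(f"drier stalks: {drier:.2f} %Cane, wetter stalks: {wetter:.2f} %Cane")
# drier stalks: 14.00 %Cane, wetter stalks: 12.17 %Cane
```

With 15 kg more water and the same 14 kg of sucrose, %Cane drops from 14.00 to about 12.17 without any change in the amount of sucrose synthesized, which is the mechanism behind the lower values expected under high soil moisture.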

Sequencing data
Sequencing the 220 genotypes of the diverse panel produced 51.27, 458.74, and 7,012.48 GB of data with GBS, RADSeq, and WGS, respectively. The GBS data (average depth 105X) contained on average 2.97 ± 1.49 million reads per sample, with a read length of 71.50 ± 1.40 bp. For the paired-end RADSeq (average depth 27.0X) and WGS (average depth 39X) data, there were on average 12.11 ± 2.69 and 108.00 ± 133.40 million reads per sample, respectively, with longer reads of 87.50 ± 3.10 bp and 138.00 ± 3.40 bp, respectively. After quality control, the reads from each sequencing technology were mapped to the CC 01-1940 sugarcane reference genome [34], with average mapping rates of 71.30%, 30.50%, and 85.10% for GBS, RADSeq, and WGS, respectively. The aligned data from each technology were then merged into a consensus SNP panel, resulting in 137,889 high-quality SNPs that were used for further analysis (Fig. 1).
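The merging step is not described in detail here. As a minimal sketch, assuming each technology yields biallelic SNP sites keyed by chromosome (or contig) and position, a consensus panel could be formed as the union of sites, dropping any site whose reference/alternative alleles disagree between technologies. The function `merge_snp_panels` and the toy sites are hypothetical, and the actual pipeline may apply further filters (for example, on depth, missingness, or minor allele frequency).

```python
from typing import Dict, Set, Tuple

SnpKey = Tuple[str, int]   # (chromosome or contig, position)
Alleles = Tuple[str, str]  # (reference allele, alternative allele)


def merge_snp_panels(*panels: Dict[SnpKey, Alleles]) -> Dict[SnpKey, Alleles]:
    """Union of SNP sites across panels; sites with conflicting alleles are dropped."""
    merged: Dict[SnpKey, Alleles] = {}
    conflicting: Set[SnpKey] = set()
    for panel in panels:
        for key, alleles in panel.items():
            if key in conflicting:
                continue
            if key in merged and merged[key] != alleles:
                # Technologies disagree on the alleles: exclude the site entirely.
                conflicting.add(key)
                del merged[key]
            else:
                merged[key] = alleles
    return merged


# Hypothetical toy calls from three technologies.
gbs = {("chr1", 100): ("A", "G"), ("chr1", 250): ("C", "T")}
radseq = {("chr1", 250): ("C", "T"), ("contig_1", 42): ("G", "A")}
wgs = {("chr1", 100): ("A", "C"), ("chr4", 900): ("T", "C")}

consensus = merge_snp_panels(gbs, radseq, wgs)
print(sorted(consensus))  # [('chr1', 250), ('chr4', 900), ('contig_1', 42)]
```

Site ("chr1", 100) is removed because GBS and WGS report different alternative alleles, while the site shared by GBS and RADSeq is kept once.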

Phenotypic analysis
Based on Bonferroni-adjusted p-values, 3 and 24 data points were identified as outliers at early (10 months after planting) and normal maturity (13 months after planting), respectively (data not shown). Outliers are data points that fall outside the bulk of the data for a particular subject and can mask the true distribution of the data [35,36]; they are therefore commonly identified and removed to reduce their impact on the estimation of the traits of interest [35]. Sucrose accumulation (%Cane) followed a continuous, approximately normal distribution, with a mean of 12.64% at early maturity and 14.37% at normal maturity. For normal maturity, the best-fitting model included all random effects, whereas for early maturity, a model excluding the genotype-by-crop-cycle interaction was selected (Table 1). Likewise, for normal maturity the selected model allowed heterogeneous residual variance across crop cycles ($V(\varepsilon_{ijkl}) = \sigma^2_{e(i)}$), whereas for early maturity the residual variance was homogeneous (Table 1). Broad-sense heritability was 0.90 and 0.83 for early and normal maturity, respectively. These high heritability values suggest good data quality and an effective experimental design, both of which help reduce the residual variance ($\sigma^2_e$). For early maturity, the genotypic effect accounted for a large proportion of the total variance (2.58), whereas for normal maturity, the residual variance had a larger effect (2.94) (Table 2).

Fig. 1 A graphical abstract of the methodology, analysis, and results presented in this study. The analysis of phenotypic and genotypic data (consensus panel) is presented following a linear mixed model. The results of the association analysis (77 associated markers) are subsequently shown. Finally, the 82 candidate genes whose function is involved in the accumulation and/or production of sucrose are presented.
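The outlier screen and heritability estimate described above can be sketched in Python, although this is not necessarily the procedure used in this study. The outlier function is a simplified Bonferroni test on standardized residuals using a normal approximation. The heritability function assumes the common entry-mean formula $H^2 = \sigma^2_G / (\sigma^2_G + \sigma^2_{GC}/c + \sigma^2_e/(rc))$, with $r$ replications, $c$ crop cycles, and a single residual variance. Function names and all numeric inputs are hypothetical.

```python
from statistics import NormalDist, mean, stdev
from typing import List, Sequence


def bonferroni_outliers(residuals: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices of residuals whose two-sided p-value is below alpha / n.

    Simplification: residuals are standardized with the sample mean and SD and
    compared with a standard normal distribution.
    """
    n = len(residuals)
    mu, sd = mean(residuals), stdev(residuals)
    threshold = alpha / n  # Bonferroni correction for n simultaneous tests
    normal = NormalDist()
    flagged = []
    for i, r in enumerate(residuals):
        z = (r - mu) / sd
        p_value = 2.0 * (1.0 - normal.cdf(abs(z)))
        if p_value < threshold:
            flagged.append(i)
    return flagged


def broad_sense_h2(var_g: float, var_gc: float, var_e: float, reps: int, cycles: int) -> float:
    """Entry-mean broad-sense heritability across replications and crop cycles.

    Assumes one (homogeneous) residual variance; set var_gc to 0 when the
    genotype-by-crop-cycle term is dropped from the model.
    """
    return var_g / (var_g + var_gc / cycles + var_e / (reps * cycles))


# Hypothetical residuals: nineteen small values and one extreme value.
residuals = [0.1, -0.2, 0.05, -0.1, 0.15, -0.05, 0.2, -0.15, 0.0, 0.1,
             -0.1, 0.05, -0.05, 0.1, -0.2, 0.15, -0.1, 0.05, 0.0, 6.0]
print(bonferroni_outliers(residuals))  # [19]

# Hypothetical variance components for 3 replications and 3 crop cycles.
print(round(broad_sense_h2(var_g=2.0, var_gc=0.5, var_e=1.5, reps=3, cycles=3), 3))  # 0.857
```

The heritability sketch shows why replication and multiple crop cycles help: the residual and interaction variances are divided by the number of plots and cycles averaged into each genotype mean.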

Population structure
The 220 genotypes were classified into four subpopulations (Fig. 2). Six genotypes (S18, S46, S78, S80, S170, and S171) had posterior membership probabilities indicating that they belong to two or more subpopulations; these genotypes were assigned to the subpopulation with the highest membership probability. The first and second subpopulations were composed mainly of hybrids, while the third grouped the genotypes of S. spontaneum, S. officinarum, S. sinense, S. barberi, and some interspecific genotypes (Fig. 2). The fourth subpopulation contained only the genotypes S1, S132, and S3, all from the genus Erianthus spp. (Fig. 2).
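The assignment rule described above, in which each genotype goes to the subpopulation with its highest membership probability, can be sketched as follows. The 0.7 cutoff used to flag admixed genotypes, the function name, and the membership values are hypothetical; the threshold used in the study is not stated here.

```python
from typing import Dict, List, Tuple


def assign_subpopulations(
    membership: Dict[str, List[float]], admixed_below: float = 0.7
) -> Tuple[Dict[str, int], List[str]]:
    """Assign each genotype to its most probable subpopulation (numbered from 1).

    Genotypes whose highest membership probability is below `admixed_below`
    are also reported as admixed, but they still receive an assignment.
    """
    assignments: Dict[str, int] = {}
    admixed: List[str] = []
    for genotype, probs in membership.items():
        best = max(range(len(probs)), key=lambda k: probs[k])
        assignments[genotype] = best + 1
        if probs[best] < admixed_below:
            admixed.append(genotype)
    return assignments, admixed


# Hypothetical membership probabilities for K = 4 subpopulations.
membership = {
    "geno_a": [0.01, 0.01, 0.03, 0.95],
    "geno_b": [0.48, 0.45, 0.05, 0.02],
    "geno_c": [0.90, 0.08, 0.01, 0.01],
}
assignments, admixed = assign_subpopulations(membership)
print(assignments)  # {'geno_a': 4, 'geno_b': 1, 'geno_c': 1}
print(admixed)      # ['geno_b']
```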

Association analysis and candidate genes
For early and normal maturity, 237 and 183 markers, respectively, were associated with sucrose (%Cane), with the general model detecting the most markers at both maturities (Table 3). After removing the most significant markers and reanalyzing the general model, 192 and 103 markers were identified as false positives at early (Fig. 3) and normal maturity (Fig. 4), respectively. Four markers were identified as false positives for the 2-dom-ref model at early maturity, and 11 for the 1-dom-ref model at normal maturity (Table 3). No false positives were identified for the other genetic models (Table 3). After this filtering, a total of 109 markers, 41 for early maturity and 69 for normal maturity, were retained for further analysis (Table 3, Additional file 2). The general model identified the most markers at both early (Fig. 5) and normal (Fig. 6) maturity. When the associated markers were compared across genetic models, 6 markers at early maturity (1_33380771, 10_15141188, 4_10932960, contig_39315_31775, contig_50499_5072, and contig_65540_9617) and 8 markers at normal maturity (1_63454860, 3_1957031, 4_49815442, 4_55115204, 8_13879677, contig_39799_65, contig_40813_18644, and contig_50499_5072) were detected by two or more models. For these markers, the model with the highest R² was retained. This resulted in 77 markers: 16 for early maturity, 45 for normal maturity, and 16 shared between both maturities (Table 4). The largest proportion of phenotypic variation explained by any of the 77 markers (R²) was 25.39% (Table 4).
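The genetic models named above differ in how allele dosage is coded before each marker is tested. The sketch below assumes a decaploid dosage scale (0 to 10 copies of the alternative allele) and a GWASpoly-style naming in which "k-dom-alt" (or "k-dom-ref") treats any genotype carrying at least k copies of the alternative (or reference) allele as equivalent; this coding convention is an assumption, not the study's code. The general model, which treats each dosage class as a separate factor level, is not shown. The second function mirrors the rule of keeping, for markers detected by several models, the model with the highest R². Marker names and R² values are hypothetical.

```python
from typing import Dict, List, Tuple

PLOIDY = 10  # assumption: dosage scored on a decaploid scale


def code_dosage(dosage_alt: int, model: str, ploidy: int = PLOIDY) -> float:
    """Recode an alternative-allele dosage under the additive or a k-dom model."""
    if not 0 <= dosage_alt <= ploidy:
        raise ValueError("dosage out of range")
    if model == "additive":
        return float(dosage_alt)
    k_text, sep, allele = model.partition("-dom-")
    if not sep or allele not in ("ref", "alt"):
        raise ValueError(f"unsupported model: {model}")
    k = int(k_text)
    copies = dosage_alt if allele == "alt" else ploidy - dosage_alt
    # Dominance: any genotype with at least k copies of the allele is coded 1.
    return 1.0 if copies >= k else 0.0


def keep_best_model(hits: List[Tuple[str, str, float]]) -> Dict[str, Tuple[str, float]]:
    """For each marker, keep the (model, R2) pair with the highest R2."""
    best: Dict[str, Tuple[str, float]] = {}
    for marker, model, r2 in hits:
        if marker not in best or r2 > best[marker][1]:
            best[marker] = (model, r2)
    return best


print([code_dosage(d, "2-dom-ref") for d in (7, 8, 9, 10)])  # [1.0, 1.0, 0.0, 0.0]
hits = [("marker_x", "general", 0.08), ("marker_x", "additive", 0.11),
        ("marker_y", "1-dom-alt", 0.05)]
print(keep_best_model(hits))  # {'marker_x': ('additive', 0.11), 'marker_y': ('1-dom-alt', 0.05)}
```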

Candidate genes
A total of 4757 genes were found within the LD regions of the 77 markers, 1695 of which had a known function in plants. Of these 1695 genes, 82 were associated with sucrose accumulation and/or sucrose production and were located within the LD regions of 39 of the 77 associated markers (Table 5). For the remaining 38 markers, no candidate genes had an annotated function related to sucrose accumulation and/or production, so these markers were not considered further. Of the 82 candidate genes, eight were involved in sucrose transport, seven in starch and sucrose metabolism in Sorghum sp. [37][38][39], and 67 were glycosyltransferases, glucosidases, hormone-related genes, and transcription factors (Table 5).

Fig. 2 Population structure analysis of 220 sugarcane genotypes based on 137,889 SNPs. The purple and magenta branches correspond to subpopulations 1 and 2, respectively. The cyan branches refer to subpopulation 3, and the yellow branches to subpopulation 4. There are two main branches: the first includes subpopulations 1 and 2 (purple and magenta), and the second includes subpopulations 3 and 4 (cyan and yellow). The image was created by the authors using the information deposited as described under Data availability.
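The candidate gene search described above amounts to an interval-overlap query around each marker followed by a functional-annotation filter. The sketch below is a hypothetical illustration: the 100 kb half-window stands in for the LD extent (the value used in this study is not given in this section), and the `Gene` records, descriptions, and keyword list are invented for the example.

```python
from typing import Iterable, List, NamedTuple, Optional, Sequence


class Gene(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    description: str


def candidate_genes(
    chrom: str,
    pos: int,
    genes: Iterable[Gene],
    half_window_bp: int,
    keywords: Optional[Sequence[str]] = None,
) -> List[str]:
    """Names of genes overlapping [pos - half_window_bp, pos + half_window_bp].

    If keywords are given, keep only genes whose description contains one of
    them (case-insensitive), mimicking a functional-annotation filter.
    """
    lo, hi = pos - half_window_bp, pos + half_window_bp
    hits = [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]
    if keywords:
        lowered = [k.lower() for k in keywords]
        hits = [g for g in hits if any(k in g.description.lower() for k in lowered)]
    return [g.name for g in hits]


genes = [
    Gene("chr1", 950_000, 953_000, "gene_a", "Sucrose synthase"),
    Gene("chr1", 1_080_000, 1_081_500, "gene_d", "Hypothetical protein"),
    Gene("chr1", 1_200_000, 1_204_000, "gene_b", "Sugar transporter"),
    Gene("chr2", 1_000_000, 1_002_000, "gene_c", "Invertase"),
]
print(candidate_genes("chr1", 1_000_000, genes, half_window_bp=100_000))
# ['gene_a', 'gene_d']
print(candidate_genes("chr1", 1_000_000, genes, 100_000,
                      keywords=["sucrose", "invertase", "sugar"]))
# ['gene_a']
```

The two steps mirror the funnel reported above: all genes in the LD window first, then only those whose annotation relates to sucrose accumulation and/or production.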

Phenotypic analysis
Sucrose (%Cane) accumulation at early maturity was 1.73 percentage points lower than at normal maturity (12.64% vs. 14.37%). These results are similar to those reported in Ethiopia [40], Egypt [41], and Brazil [42], where sucrose content increased with age at harvest. Similar results were observed in China in a breeding population of 100 varieties representing four improved generations spanning a 20-year period [6]. The differences in sucrose content between the two ages likely reflect physiological changes in the plant: during the rapid vegetative growth phase (between 4 and 9 months after planting), photoassimilates are mainly used for cell elongation [17,43]. In contrast, during the maturation phase (between 9 and 13 months after planting), the levels of reducing sugars, invertases, and sucrose synthase (SUS) decrease, while sucrose phosphate synthase (SPS) increases [41][44][45][46]. Invertase activity is high in the upper internodes during rapid vegetative growth but starts to decrease when the maturation phase begins [20]. This balance between the enzymes of sucrose synthesis (i.e., SPS) and of sucrose hydrolysis (i.e., SUS and invertases), together with lower levels of reducing sugars, leads to greater sugar accumulation in the sink organs because the growth demand of the meristematic tissues declines [28,47].
Variation in sucrose accumulation (%Cane) is assumed to result from genotypic effects and the genotype-by-crop-cycle interaction [48]. For normal maturity, the residual variance was heteroskedastic, with the plant cane having the highest residual variance ($\sigma^2_{e(PC)} = 2.94$) and the first ratoon the lowest ($\sigma^2_{e(FR)} = 0.68$) (Table 2). Similar results were reported in South Africa and Louisiana, where residual variance was the largest source of phenotypic variation for sucrose in the plant cane [49][50][51]. The higher residual variance in the plant cane could be explained by its poorer, less-established root system compared with the ratoons, whose strong, well-established root systems facilitate the establishment, germination, development, and growth of the plant [52].

Population structure
The 220 genotypes were classified into four subpopulations (Fig. 2). The first and second subpopulations were composed of varieties bred mainly in Colombia [53]. The third subpopulation grouped S. spontaneum with S. officinarum, S. sinense, S. barberi, and some interspecific genotypes. This grouping is consistent with the evolutionary relationships within the genus Saccharum, in which S. spontaneum has contributed nearly 39% of the genome of S. sinense and S. barberi and between 15 and 27.5% of the genome of modern cultivars [53][54][55][56][57][58]. The fourth subpopulation contained only the genotypes S1, S132, and S3, all from the genus Erianthus spp. (Fig. 2). Erianthus is one of the genera most closely related to Saccharum spp. and has been used mainly to increase biomass, vigor, ratooning ability, and tolerance to drought and waterlogging stresses [59].
Erianthus spp., along with Saccharum spp., Miscanthus spp., Miscanthidium spp., Pseudosorghum spp., Narenga spp., and the trans-Himalayan species, make up the Saccharum complex, which may be involved in the origins of cultivated sugarcane [59][60][61]. Finally, the 220 genotypes were divided into two main branches, one grouping the modern genotypes (purple and magenta in Fig. 2) and the other the wild species and interspecific crosses (Fig. 2), suggesting a narrow genetic background for the cultivars held in Cenicaña's germplasm bank.

Association analysis and candidate genes
Sucrose accumulation in sugarcane is a complex process that includes sucrose synthesis in source tissues, transport of sugars from source to sink, energy generation, and sugar storage [62,63]. In this study, 82 candidate genes involved in sucrose accumulation and/or production were found within the LD regions of 39 of the 77 associated markers. For the remaining 38 SNPs, no candidate genes associated with sucrose production and/or accumulation were identified, so they were not considered in this analysis. Among the 82 candidate genes, four key enzymes involved in sucrose hydrolysis were identified: sucrose synthase 6 (SUS6), near the marker 4_55767476; insoluble beta-fructofuranosidase, isoenzyme 3 (CIN3), near the marker 1_17142404; and alkaline/neutral invertase (CINV2) and cytosolic invertase 1 (CINV1), both near the marker 5_9031549 (Table 5).
The markers 4_55767476 and 1_17142404 were significantly associated at normal maturity with negative effects, whereas the marker 5_9031549 was significantly associated at early maturity with a positive effect (Table 4). Sucrose synthase has implications for cell metabolism, metabolite production, and the production of cell wall precursors (UDP-glucose) [64]. SUS6 catalyzes the reversible conversion of sucrose to UDP-glucose and fructose for various metabolic pathways [65], while CIN3 cleaves terminal nonreducing beta-fructofuranoside residues [66]. CINV2 can regulate sugar-mediated root development by controlling sucrose catabolism in root cells, while CINV1 participates in osmotic stress-induced inhibition of lateral root growth by controlling hexose concentration in the cells [64,67]. These enzymes play a central role in sucrose accumulation because sucrose is hydrolyzed by sucrose synthase or invertase in sink tissues, and the products are then used for cell growth, development, or sugar storage [21]. The negative effects of the markers near SUS6 and CIN3 suggest that these two enzymes reduce sucrose accumulation by hydrolyzing this disaccharide into glucose and fructose, which are then taken up across the plasma membrane of sink cells and consumed for cell growth [68].
The second important process in sucrose accumulation is the transport of sugars from source to sink. Candidate genes from two plant gene families involved in this process were found: sucrose transporters (SUTs) and monosaccharide transporters (MSTs) [69]. For early maturity, MST2, the plastidic glucose transporter At1g05030, and SUT4 were found within the LD regions of the markers 1_33380771, 2_29103764, and 4_55115204, respectively (Table 5). At normal maturity, the candidate genes monosaccharide-sensing protein 2 (MSSP2), polyol transporter 5 (PLT5), and the sugar-phosphate/phosphate translocator At1g53660 (GTP2) were found near the markers 4_15026970, 10_26811308, and 4_28389065, respectively (Table 5). In addition, the markers 1_61256982 and 9_33845426, shared between both maturities, were close to the candidate genes MST1 and ERD6-like, respectively. The genes MST1, MST2, GTP2, At1g05030, and PLT5 are all directly involved in the transport of monosaccharides, which are required for various processes of plant growth and development [68] and for osmotic adjustment (e.g., monosaccharide homeostasis) [70]. The ERD6-like gene [71]. SUT4 is involved in the transport of disaccharides, especially sucrose, the main photoassimilate transported from source to sink organs through the phloem [72,73].
In this study, the markers 4_55115204, 4_15026970, and 10_26811308, located within the LD regions of sugar transporters (Table 5), showed positive effects (Table 4); that is, these markers appeared to be associated with increased sucrose content. In contrast, the markers close to the candidate genes MST2 (1_33380771) and At1g05030 (2_29103764) had negative effects (Table 4). This is expected because, at 10 months (early maturity), the plant is in transition from the rapid growth phase to the maturation phase and uses the hexoses present mainly for vegetative growth rather than for accumulation [74].
The candidate genes alpha,alpha-trehalose-phosphate synthase 6 (TPS6), alpha,alpha-trehalose-phosphate synthase 7 (TPS7), and alpha,alpha-trehalose-phosphate synthase 11 (TPS11) were found near the markers 2_29103764, 3_43581242, and 4_49815442, respectively (Table 5). These genes are involved in various phosphorylation processes, such as trehalose synthesis and glycolysis, which can regulate sucrose accumulation by acting on the expression of genes encoding carbohydrate-metabolism and other metabolic enzymes [75]. For example, the candidate gene TPS, found at both maturities, is involved in the synthesis of trehalose-6-phosphate (T6P), which links growth and development to carbon status by exerting negative feedback regulation on sucrose levels [76,77]. In addition, the genes pyrophosphate-fructose 6-phosphate 1-phosphotransferase alpha (PFP-ALPHA), glucose-6-phosphate isomerase, cytosolic (PHI1), and hexokinase 3 (HXK3), near the markers 2_52421397, 1_12538127, and 8_13879677, respectively (Table 5), are involved in glycolysis, a process required for energy production [78]. PFP-ALPHA, detected at early maturity, participates in carbohydrate breakdown by catalyzing the reversible interconversion of fructose-6-phosphate and fructose-1,6-bisphosphate, a glycolytic step [79]. PHI1 catalyzes the reversible isomerization of glucose-6-phosphate to fructose-6-phosphate, the second reaction step of glycolysis [80], while HXK3, a transferase detected at normal maturity, phosphorylates glucose to produce glucose-6-phosphate, which is important for glycolysis and the pentose phosphate pathway [81]. The positive effect of the marker 8_13879677 near HXK3 (Table 4) suggests that this marker is associated with increased sucrose, possibly through the storage of glucose as glucose-6-phosphate.
Transcription factors are proteins that bind to DNA to control the expression of genes involved in processes such as the pentose phosphate pathway, glycolysis, and sugar and hormone metabolism [82]. The present study identified two candidate genes encoding transcription factors: ethylene-responsive transcription factor ERF12 and ethylene-responsive transcription factor ERF112 (Table 5). Both genes were detected during the maturation phase (i.e., normal maturity), suggesting that they may increase the levels of sucrose synthase (SUS), invertase (INV), and sucrose phosphate synthase (SPS) [83]. An increase in ethylene activity has been reported to be associated with the production of lignin and fiber, which are important for source-sink regulation and sucrose accumulation [83]. Finally, 16 markers were associated with sucrose accumulation at both early and normal maturity. These markers were close to genes that may be linked to fundamental metabolic processes necessary for sucrose production or accumulation in any environment and at any developmental age. In particular, the genes Man-9GlcNAc2 alpha-1,3-glucosyltransferase (At5g38460), endoglucanase 9 (GLU1), bifunctional fucokinase/fucose pyrophosphorylase (FKGP), ERF112, TPS6, MST1, probable plastidic glucose transporter 1, ERD6-like 4, and SUT4 could be considered housekeeping genes because they encode enzymes necessary for basic, ongoing cellular metabolism [84]. For example, At5g38460 adds the first glucose residue to the lipid-linked oligosaccharide precursor for N-linked glycosylation, which is necessary for protein glycosylation and folding and for the subsequent exit of proteins from the endoplasmic reticulum [85]. GLU1 affects internode elongation and cell wall components [86], while FKGP participates in the salvage pathway that reactivates free fucose into nucleotide sugars, converting it into GDP-fucose, a substrate for the biosynthesis of cell wall polysaccharides [87]. The SUT4, ERD6-like 4, and MST1 genes are sugar transporters and are generally considered synergistic genes because, when sucrose reaches sink tissues, it is hydrolyzed into glucose and fructose (hexoses), which can be used for growth or storage through a sugar-coupled transporter (STP) [72]. The genes found in this study show that sucrose accumulation involves multiple metabolic pathways, such as trehalose metabolism and starch and sucrose metabolism, biological processes such as glycolysis, and different sugar transporters, which act synergistically in plant development and sucrose accumulation.

Conclusions
In this study, sucrose concentration was dissected through GWAS under 12 genetic models (i.e., general, additive, and dominance models with 1 to 5 dominant alleles) in a diverse sugarcane population of 220 genotypes. The analysis identified 16, 45, and 16 markers significantly associated with sucrose concentration at early maturity, at normal maturity, and at both maturities, respectively. After candidate gene analysis, 82 genes with an annotated function related to sucrose accumulation and/or production were found within the LD regions of only 39 markers; for the remaining 38 markers, no annotated gene was associated with the trait. Among the 82 candidate genes, 18 were highlighted because they are involved in sucrose hydrolysis (SUS6 and CIN3), sugar transport (e.g., MST1, MST2, PLT5, SUT4), phosphorylation processes (TPS genes), glycolysis (PFP-ALPHA and HXK3), and transcriptional regulation (ERF12 and ERF112). These 39 markers may help select favorable genetic resources for the sugarcane breeding process in Colombia. The main contribution of this study is the genetic dissection of sucrose content, a quantitative trait, in a decaploid organism and the identification of several molecular markers related to sucrose accumulation or production at different maturity phases. Finally, these results provide new insights into the molecular mechanisms involved in sucrose accumulation in sugarcane and contribute important genomic resources for future research on sucrose accumulation and/or production in humid environments in Colombia.

Experimental site
The experiment was planted in the humid environment of the Cauca River Valley, Colombia, in fields of the La Cabaña sugarcane mill located at 3° 10' 58.44" N and 76° 21' 7.599" W. The site belongs to agroecological zone 6H4, which is characterized by soils with high humidity (water excesses between 400 and 600 mm/year) that are predominantly clayey, fine-textured, and poorly aerated [26]. The experiment was conducted during the plant cane, first ratoon, and second ratoon. The field has a tropical climate, with accumulated precipitation of 1669 mm, 1677.70 mm, and 1507.50 mm for the plant cane, first ratoon, and second ratoon, respectively (Table 6). All harvests were carried out during the second rainy season of the Cauca River Valley: in the third week of September 2018 for the plant cane, October 2019 for the first ratoon, and October 2020 for the second ratoon. The average temperature ranged from 23.25 °C in the plant cane to 23.79 °C in the second ratoon, with relative humidity above 80% for all three harvests (Table 6).

Plant material
Cenicaña's diverse panel of 220 genotypes from its sugarcane breeding program was selected for this study [88]. The panel contained 98 genotypes that represent the genetic diversity of Cenicaña's germplasm bank [89]; 31 genotypes representative of the wild species Saccharum officinarum, Saccharum barberi, Saccharum sinense, Saccharum spontaneum, and Erianthus spp.; 58 genotypes relevant to Cenicaña's breeding program because of their differential response to the most limiting pests and diseases of the crop in Colombia; and 33 genotypes comprising introductions from other breeding programs around the world, commercial varieties in Colombia, and early-selection-stage genotypes from Cenicaña's breeding program [88] (Additional file 1). In total, there were 189 modern genotypes, 60% of which were bred in Cenicaña's breeding program (Additional file 1).

Experimental design and data collection
The 220 genotypes were planted in an alpha-lattice design with three replications. The alpha lattice is an incomplete block design (IBD), used primarily to reduce experimental error by partitioning total field variability into small incomplete blocks [90][91][92][93], thereby minimizing the unexplained variation within each replication [90,91,94,95]. This design has been widely used in crops such as rice [92], barley [91], wheat [91], and bread wheat [96], providing good control of experimental error. To increase precision in the control of random error, the commercial checks S29, S64, and S177 were replicated 8, 7, and 8 times, respectively, within each replication, resulting in 240 experimental units per replication. Each replication contained 12 incomplete blocks of 20 experimental units. The experimental unit was a five-row plot, with rows 5 m long and 1.65 m apart. Agronomic management followed the commercial practices of the sugarcane mill. To avoid border effects, the sampling unit consisted of two rows, the third and fourth rows of each plot.
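The plot count follows from the design arithmetic: if the three checks are among the 220 panel genotypes (our reading, since 217 + 8 + 7 + 8 = 240 = 12 × 20), each replication holds 240 plots in 12 blocks of 20. The Python sketch below only checks that arithmetic and shuffles entries into blocks at random; it is not an alpha-lattice generator, which would allocate entries from a generating array so that pairs of genotypes share blocks as evenly as possible. Labels such as `G001` are hypothetical placeholders.

```python
import random

N_GENOTYPES = 220
CHECK_REPS = {"S29": 8, "S64": 7, "S177": 8}  # copies of each check per replication
N_BLOCKS, BLOCK_SIZE = 12, 20


def replication_entries():
    """240 entries: 217 non-check genotypes once each plus the replicated checks.

    Assumes the three checks are part of the 220-genotype panel."""
    others = [f"G{i:03d}" for i in range(1, N_GENOTYPES - len(CHECK_REPS) + 1)]
    checks = [name for name, reps in CHECK_REPS.items() for _ in range(reps)]
    return others + checks


def randomize_replication(rng):
    """Shuffle one replication and cut it into consecutive blocks (NOT an alpha design)."""
    entries = replication_entries()
    assert len(entries) == N_BLOCKS * BLOCK_SIZE  # 217 + 23 == 240 == 12 * 20
    rng.shuffle(entries)
    return [entries[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE] for b in range(N_BLOCKS)]


rng = random.Random(42)
layout = [randomize_replication(rng) for _ in range(3)]  # three replications
print(len(layout[0]), len(layout[0][0]))  # 12 20
```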
Data was collected on a plot basis for sucrose accumulation (%Cane) at 10 and 13 months after planting (map), hereafter referred to as early and normal maturity, respectively. For early maturity, measurements were taken in the plant cane and first ratoon, while for normal maturity, measurements were taken in the plant cane, first ratoon, and second ratoon. Sucrose (%Cane) was measured at early maturity using the CeniAD method [97] and at normal maturity using the direct analysis (DAC) method [28]. The CeniAD method evaluates sucrose in three internodes (one apical, one middle, and one basal) from each of 11 mature stalks randomly selected from each genotype. The DAC method, in contrast, evaluates sucrose content in 11 mature stalks by shredding the complete stalk (nodes and internodes) of each genotype. For both methods, samples were shredded and the juice was extracted with a hydraulic press. Sucrose was quantified in the extracted juice by near-infrared (NIR) spectroscopy.

Data analysis
Data was analyzed through a combined analysis across crop cycles (i.e., plant cane, first ratoon, and second ratoon) [98]. In this type of model, some factors are nested within the experimental structure and act as blocking factors inherent to the experimental units [98]. The statistical model was:

$$Y_{ijkl} = \mu + C_i + R_{j(i)} + B_{k(ij)} + G_l + I_{il} + \varepsilon_{ijkl}$$

where $Y_{ijkl}$ is the sucrose content at early or normal maturity of genotype $l$ planted in incomplete block $k$ nested in replication $j$ of crop cycle $i$; $\mu$ is the overall mean; $C_i$ is the effect of crop cycle $i$; $R_{j(i)}$ is the effect of replication $j$ nested within crop cycle $i$; $B_{k(ij)}$ is the effect of incomplete block $k$ nested in replication $j$ and crop cycle $i$; $G_l$ is the effect of genotype $l$; $I_{il}$ is the interaction between genotype $l$ and crop cycle $i$; and $\varepsilon_{ijkl}$ is the random residual [98]. All effects were assumed random except crop cycle ($C_i$). The genotypes were treated as random effects because they constitute a random and representative sample of the sugarcane breeding pool used at Cenicaña (Cenicaña's sugarcane diverse panel [88]).
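To make the structure of the model in the preceding paragraph concrete, the Python sketch below draws data from it: crop cycle enters as a fixed mean, and the replication, block, genotype, genotype × crop cycle, and residual terms are independent normal draws. The cycle means and standard deviations are arbitrary placeholders, the extra replicates of the checks are ignored, and the residual variance is homogeneous; this simulates the model and is not the SAS fit used in the study.

```python
import random


def simulate_sucrose(cycle_means, n_reps, n_blocks, block_size, sd, seed=1):
    """Draw Y_ijkl = (mu + C_i) + R_j(i) + B_k(ij) + G_l + I_il + e_ijkl.

    cycle_means[i] plays the role of mu + C_i (fixed); every other term is an
    independent N(0, sd[term]**2) draw. Returns (i, j, k, l, y) tuples.
    """
    rng = random.Random(seed)
    n_cycles, n_gen = len(cycle_means), n_blocks * block_size
    g = [rng.gauss(0, sd["G"]) for _ in range(n_gen)]                 # G_l
    gxc = {(i, l): rng.gauss(0, sd["I"])                              # I_il
           for i in range(n_cycles) for l in range(n_gen)}
    rows = []
    for i in range(n_cycles):
        for j in range(n_reps):
            r = rng.gauss(0, sd["R"])                                 # R_j(i)
            order = list(range(n_gen))
            rng.shuffle(order)                                        # new allocation per replication
            for k in range(n_blocks):
                b = rng.gauss(0, sd["B"])                             # B_k(ij)
                for l in order[k * block_size:(k + 1) * block_size]:
                    e = rng.gauss(0, sd["e"])                         # e_ijkl
                    rows.append((i, j, k, l, cycle_means[i] + r + b + g[l] + gxc[(i, l)] + e))
    return rows


sd = {"R": 0.2, "B": 0.3, "G": 0.8, "I": 0.4, "e": 0.6}  # placeholder SDs
data = simulate_sucrose([13.0, 13.5, 14.0], n_reps=3, n_blocks=12, block_size=20, sd=sd)
print(len(data))  # 3 cycles x 3 reps x 240 plots = 2160
```

Fitting such simulated data with a mixed-model routine and checking that the variance components are recovered is a common way to sanity-check a model specification before applying it to field data.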
Outliers were detected by computing, for each residual, the probability of obtaining a larger absolute value under a t-distribution [99,100]. Each p-value was then adjusted with a Bonferroni correction at a 2% significance level [101,102]. After removing outliers, 16 models were evaluated, covering all possible combinations of the random effects and testing for homogeneity ($V(\varepsilon_{ijkl}) = \sigma^2_e$) or heterogeneity ($V(\varepsilon_{ijkl}) = \sigma^2_{e(i)}$) of the residual variance. The best-fitting model was identified with the Bayesian Information Criterion (BIC) [103]. BIC evaluates models in terms of their posterior probabilities and penalizes each model according to the number of parameters it includes [104][105][106], so that models with more parameters receive a higher penalty [107]. Therefore, the lower the BIC value, the better the model balances goodness of fit with parsimony (i.e., simplicity) [104][105][106]. With the selected model, best linear unbiased predictors (BLUPs) of the genotypes were obtained for further analysis [108]. Broad-sense heritability was calculated as Cullis heritability for unbalanced data, which accounts for the mean variance of the difference between two BLUPs and the genotypic variance [109]. All analyses were carried out with the PROC MIXED procedure of SAS (SAS Institute, Cary, NC).
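The model-selection and heritability steps can be sketched in a few lines of Python. The snippet enumerates the 16 candidate models described above (crop cycle fixed and genotype random in every model, with replication, block, and genotype × crop cycle interaction each present or absent, and a homogeneous or cycle-specific residual variance), counts variance parameters, and applies the textbook forms BIC = -2 log L + k ln(n) and Cullis H² = 1 - v̄/(2σ²_g), where v̄ is the mean variance of a difference between two genotype BLUPs. The log-likelihoods, sample size, and variances are hypothetical; SAS PROC MIXED defines k and n for REML fits in its own way, so its BIC values will not match this generic formula exactly, and nothing here refits a model.

```python
import itertools
import math

OPTIONAL_RANDOM = ("R", "B", "I")  # replication, block, genotype x crop cycle


def candidate_models():
    """The 16 models: G always random; R, B, I in or out; residual homo/hetero."""
    models = []
    for mask in itertools.product((False, True), repeat=len(OPTIONAL_RANDOM)):
        terms = ("G",) + tuple(t for t, keep in zip(OPTIONAL_RANDOM, mask) if keep)
        for residual in ("homogeneous", "heterogeneous"):
            models.append((terms, residual))
    return models


def n_variance_params(model, n_cycles):
    """One variance per random term, plus 1 or n_cycles residual variances."""
    terms, residual = model
    return len(terms) + (1 if residual == "homogeneous" else n_cycles)


def bic(log_lik, k, n):
    """Generic BIC = -2 log L + k ln(n); lower is better."""
    return -2.0 * log_lik + k * math.log(n)


def cullis_h2(mean_var_diff_blup, sigma2_g):
    """Cullis H2 = 1 - (mean variance of a difference of two BLUPs) / (2 * sigma2_g)."""
    return 1.0 - mean_var_diff_blup / (2.0 * sigma2_g)


models = candidate_models()
print(len(models))  # 16
# Hypothetical log-likelihoods for two of the models, n = 660 observations (placeholder):
fits = {models[0]: -1250.0, models[-1]: -1230.0}
best = min(fits, key=lambda m: bic(fits[m], n_variance_params(m, 3), 660))
print(best)  # (('G', 'R', 'B', 'I'), 'heterogeneous')
print(cullis_h2(0.2, 0.5))  # 0.8
```

In this toy comparison the richer model wins because its log-likelihood gain (20 units, i.e., 40 on the -2 log L scale) outweighs the extra penalty of five additional variance parameters (about 32.5).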

DNA extraction and sequencing
Genomic DNA was extracted from each of the 220 genotypes following the phenol-chloroform protocol [110]. DNA concentration was determined with a Thermo Fisher® NanoDrop 2000 spectrophotometer, and integrity was verified on a 0.8% agarose gel. Genomic DNA was sequenced using Genotyping-by-Sequencing (GBS) [111], Restriction-site Associated DNA Sequencing (RADSeq) [112], and Whole Genome Sequencing (WGS). DNA was digested with the restriction enzyme PstI for single-end GBS [111] and EcoRI for paired-end RADSeq [112]. For both techniques, DNA libraries were constructed following the sequencing service providers' protocols, and sequencing was carried out on an Illumina HiSeq 2000 system (Illumina, San Diego, California, USA). For WGS, DNA libraries were constructed following the protocol of the sequencing provider Novogene (Novogene Bioinformatics Technology Co. Ltd). Briefly, genomic DNA was randomly sheared into short fragments (~150 bp). Each fragment was end-repaired, A-tailed, and ligated with Illumina adapters. The adapter-ligated fragments were PCR-amplified, size-selected, purified, and sequenced on an Illumina NovaSeq platform (HWI-ST1276). Raw read quality was assessed for each strategy using cutadapt [113] and FastQC [114], removing reads with a Phred score lower than 30. Cleaned reads from each strategy were mapped to the CC 01-1940 monoploid reference genome [115] using Bowtie 2 (v2.2.5) [116]. Genotyping and variant detection were performed with the "MultisampleVariantsDetector" module of NGSEP 4.0.2 [117]. SNPs were then filtered with the "VCFFilter" module of NGSEP v4.0.2 [117], assuming a ploidy of 10 and requiring a minimum allele frequency (MAF) of 1%, a call rate of 75%, a minimum sequencing depth of 30X (at least 30 reads per position in the genome), a minimum genotype quality of 30 on the Phred scale, and a distance between markers of 1 bp, keeping only biallelic markers. Finally, the SNP markers called from GBS, RADSeq, and WGS were merged using SAMtools v1.10 [118], leaving a total of 137,889 SNPs for each of the 220 genotypes.
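The per-SNP thresholds listed above can be expressed as a single predicate. The Python sketch below assumes each site is already summarized as an allele list plus, per genotype, an alternative-allele dosage (0 to 10), a read depth, and a genotype quality; the `SnpSite` record is hypothetical, and applying the depth and quality thresholds per genotype before computing call rate and MAF is our assumption, which may differ from how NGSEP's VCFFilter orders these checks. The 1 bp marker-distance rule is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

PLOIDY = 10  # ploidy assumed during genotyping


@dataclass
class SnpSite:
    alleles: List[str]            # reference allele first, then alternative allele(s)
    dosages: List[Optional[int]]  # alternative-allele dosage (0..PLOIDY) per genotype; None = missing
    depths: List[int]             # read depth per genotype
    quals: List[int]              # Phred-scaled genotype quality per genotype


def passes_filters(site, min_maf=0.01, min_call_rate=0.75, min_depth=30, min_gq=30):
    """True if the site is biallelic and meets the call-rate and MAF thresholds,
    counting only genotype calls with depth >= min_depth and quality >= min_gq."""
    if len(site.alleles) != 2 or not site.dosages:
        return False
    called = [d for d, dp, gq in zip(site.dosages, site.depths, site.quals)
              if d is not None and dp >= min_depth and gq >= min_gq]
    if not called or len(called) / len(site.dosages) < min_call_rate:
        return False
    # allele frequency in a polyploid: total alternative copies / total allele copies
    alt_freq = sum(called) / (PLOIDY * len(called))
    return min(alt_freq, 1.0 - alt_freq) >= min_maf


ok = SnpSite(["A", "G"], [0, 1, 2, None], [40, 35, 50, 10], [40, 40, 40, 40])
print(passes_filters(ok))  # True: 3 of 4 genotypes called, MAF = 3 / 30 = 0.1
```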

Fig. 3
Fig. 3 Quantile-quantile (QQ) plots for sucrose accumulation (%Cane) for the general model with sequential removal of markers based on score (-log10 p-value) for early maturity.The black lines represent the theoretical expected values, and the gray shaded regions represent the 95% confidence interval.The green points represent the model with all SNP markers

Fig. 4
Fig. 4 Quantile-quantile (QQ) plots for sucrose accumulation (%Cane) for the general model with sequential removal of markers based on score (-log10 p-value) for normal maturity.The black lines represent the theoretical expected values, and the gray shaded regions represent the 95% confidence interval.The green points represent the model with all SNP markers
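The QQ plots in Figs. 3 and 4 compare observed scores with those expected under a uniform null distribution of p-values. A minimal Python sketch of the plotted points follows, assuming expected p-values of i/(n + 1) for the i-th smallest of n p-values; the `qq_after_removing_top` helper is our reading of "sequential removal of markers based on score" (drop the k highest-scoring markers and recompute), not necessarily the authors' procedure. The 95% band, which can be derived from the Beta(i, n - i + 1) distribution of the i-th order statistic of n uniform values, is omitted to keep the example dependency-free.

```python
import math


def qq_points(pvalues):
    """(expected, observed) -log10(p) pairs, both ordered from largest to smallest."""
    n = len(pvalues)
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    expected = [-math.log10(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(expected, observed))


def qq_after_removing_top(pvalues, k):
    """QQ points after dropping the k markers with the highest score (smallest p)."""
    return qq_points(sorted(pvalues)[k:])


pvals = [0.5, 0.25, 1.0, 1e-8]  # toy p-values; all must lie in (0, 1]
top_expected, top_observed = qq_points(pvals)[0]
print(len(qq_after_removing_top(pvals, 1)))  # 3
```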

Table 1
Bayesian information criterion (BIC) values for each of the 16 possible models. All models contained crop cycle as a fixed effect and genotype as a random effect, combined with the other random effects (denoted with an X). G = genotype; R = replication nested within crop cycle; B = block nested within replication and crop cycle; I = genotype × crop cycle interaction

Table 4
Marker effect and pseudo-R² per genetic model for the 77 SNPs significantly associated with sucrose accumulation at early (10 map) and normal (13 map) maturities

Table 5
Candidate genes in the LD region window (i.e., 500 Kb upstream and downstream) of 39 of the 77 markers associated with the accumulation and/or production of sucrose at early (10 map) or normal (13 map) maturity, or shared between both maturities

Table 5 (continued)
"...may be strongly regulated in response to some developmental and environmental signals, including senescence, pathogen attack, carbon/nitrogen starvation, and diurnal changes in transient sugar storage in the vacuole"

Table 6
Climate data observed during the plant cane, first, and second ratoon of the experiment planted in the humid environment of the Cauca River valley in fields of La Cabaña sugarcane mill